RESEARCH 




ILLUSTRATION: DAPHNE PERLMAN 


RESEARCH ARTICLE SUMMARY 


MICROBIOTA 


Cryptic diversity of cellulose-degrading 
gut bacteria in industrialized humans 


Sarah Morais, Sarah Winkler, Alvah Zorea, Liron Levin, Falk S. P. Nagies, Nils Kapust, Eva Lamed, 
Avital Artan-Furman, David N. Bolam, Madhav P. Yadav, Edward A. Bayer, 


William F. Martin, Itzhak Mizrahi* 


INTRODUCTION: Mammals, including humans, 
rely on their gut’s microbial community to break 
down plant cell wall components, notably cel- 
lulose and associated polysaccharides. However, 
there is limited evidence for cellulose fermenta- 
tion in the human gut despite the benefits of 
cellulose-containing dietary fiber for gut-micro- 
biome health and overall human well-being. 


RATIONALE: By investigating the presence of 
heretofore undescribed bacterial species within 
the human-gut microbiota that degrade com- 
plex cellulosic polysaccharides, we can reveal 
their potential sources and understand their 
intricate adaptations to diverse host lifestyles 
and diets. Insight into the prevalence and abun- 
dance of these bacteria across diverse mam- 
malian species and a wide range of human 
populations will provide critical knowledge of 
their evolutionary origins, ancestral associations, 
and trajectories that enabled their incorporation 
into the human gut. 


RESULTS: Previously unknown ruminococcal
species were discovered in the human-gut microbiota
and provisionally named Candidatus
Ruminococcus primaciens, Ruminococcus 
hominiciens, and Ruminococcus ruminiciens, all 
of which assemble functional multienzymatic 
cellulosome systems that degrade crystalline 
cellulose. These species are prevalent among 
the great apes and other nonhuman primates, 
ancient human societies, hunter-gatherer com- 
munities, and rural populations. Although 
widespread geographically they are conspi- 
cuously rare within industrialized societies. 
Notably, they exhibit distinct host preferences 
wherein R. hominiciens is associated primarily 
with humans and great apes and R. primaciens 
predominantly inhabits the gut of nonhuman 
primates and ancient human populations. 
Moreover, these species display host-specific 
diversification, forming distinct clades within 
the phylogenetic tree and aligning with their 
respective hosts. Our evolutionary analysis 
strongly suggests that R. hominiciens likely 
originated in the ruminant gut and later trans- 
ferred to humans, possibly during domestica- 
tion. High gene expression levels were observed 
for these species, reflecting their considerable
activity in their respective gut systems.

Cellulose-degrading gut bacteria of hominids across evolutionary time. Previously unknown human
gut cellulolytic ruminococcal species are highly prevalent in nonhuman primates, the great apes, ancient
human populations, hunter-gatherer communities, and in rural populations but are rare in urbanized
human populations.

Furthermore, their gene expression profile aligns with
their hosts’ dietary preferences, highlighting
their adaptability. Our analyses show that these 
novel species adapt to their host ecosystems 
by acquiring genes from co-resident gut mi- 
crobes. The human-associated strains possess 
functional adaptability highlighted by the ac- 
quisition of genes that can degrade specific 
plant fibers of monocots such as maize, rice, and 
wheat—major components of the human diet. 
Likewise, the nonhuman primate-associated 
strain exhibits the potential for degrading chitin, 
a polymer abundant in the insect exoskeleton, 
part of the diet of nonhuman primates. Our 
data provide insight into the ongoing coloni- 
zation of these species within the human gut, 
particularly those originating from ruminants 
and nonhuman primates. Specific strains appear 
to represent intermediates between primate- 
and rumen-gut ecosystems, as evidenced by 
their gene content during establishment in the 
human intestine. 


CONCLUSION: Our accumulated data indicate 
that ruminococcal lineages were more wide- 
spread in the past, evidenced by the high prev- 
alence and abundance of these strains in 
ancient human populations and among
hunter-gatherer communities and rural societies,
combined with their global distribution and 
low prevalence in industrialized societies. Dif- 
ferences in their prevalence among human 
populations may reflect dietary variation be- 
tween industrialized and nonindustrialized 
societies. Dietary fiber intake appears to be 
a key factor as high-fiber diets are reported 
among Hadza hunter-gatherers whereas lower 
fiber intake is observed in rural populations 
and the least consumption of fiber occurs in 
industrialized societies. These findings col- 
lectively imply a decline of these species in 
the human gut, likely influenced by the 
shift toward westernized lifestyles, poten- 
tially impacting energy balance and other 
health-related aspects. The presence of 
transitional strains that recently colonized 
the human gut indicates that ruminants 
and nonhuman primates could be a source 
and reservoir for cellulosome-producing rumi- 
nococcal strains, which continue to colonize 
and adapt to the human gut. There may be 
potential for intentional reintroduction or 
enrichment of these species in the human 
gut through targeted dietary approaches and 
specialized probiotics.

Humans, like all mammals, depend on the gut microbiome for digestion of cellulose, the main component
of plant fiber. However, evidence for cellulose fermentation in the human gut is scarce. We have 
identified ruminococcal species in the gut microbiota of human populations that assemble functional 
multienzymatic cellulosome structures capable of degrading plant cell wall polysaccharides. One of these 
species, which is strongly associated with humans, likely originated in the ruminant gut and was 
subsequently transferred to the human gut, potentially during domestication where it underwent 
diversification and diet-related adaptation through the acquisition of genes from other gut microbes. 
Collectively, these species are abundant and widespread among ancient humans, hunter-gatherers, and 
rural populations but are rare in populations from industrialized societies thus indicating potential 
disappearance in response to the westernized lifestyle. 


Dietary fiber is beneficial to gut microbiome
stability and richness and has important
implications for human health (1). Fermentation
of dietary fiber in the human
gut regulates digestive transit, prevents
obesity and diabetes, and reduces cardiovascular
diseases and cancer (1). Microbial activity transforms
these indigestible glycans into short-chain
fatty acids which supply energy to the host and 
have multiple effects not only on the gut but 
also systemically (2). Cellulose is a major part 
of the plant cell wall (3) and consequently a 
common component of diets that include plant- 
based components. The benefits of cellulose on 
host health have been shown in animals and 
include prevention of colon cancer (4) and re- 
duction in blood sugar levels (5). The preva- 
lence of cellulose in processed food is very low 
but there is a growing preference to decrease 
the amount of processed food ingredients in 
favor of a plant-based diet with increased fiber 
levels.

It was long believed that crystalline cellulose
was not digested in the human gut, in contrast 
to ruminants and other herbivores (6, 7). Evi- 
dence for the degradation of microcrystalline 
cellulose—the purified crystalline cellulose por- 
tion from cellulose fibers—by human gut bacteria 
was first reported in 2003 (8) and the micro- 
crystalline cellulose degrader Ruminococcus 
champanellensis was isolated a decade later 
(9). Subsequently, cellulosomes (multi-enzymatic
complexes that degrade plant-fiber polysaccharides)
were detected in this bacterium. Biochemical
characterization of its interactive cellulosomal
proteins and enzymes confirmed its full
functionality (10-12). Despite this discovery,
cellulose degradation and fermentation in the
human gut are rare or absent in most humans
(13, 14). Nevertheless,
the presence of cellulosomes across gut eco- 
systems indicates that they play a distinct role 
in promoting energy release from dietary fiber. 

Despite considerable progress, fundamental 
questions remain concerning the prevalence 
of cellulosome-producing bacterial species in the 
mammalian gut, their adaptability to host life- 
style and diet, and whether other undiscovered 
cellulosome-producing bacterial species reside 
in the human gut. In this study we aimed to 
address these questions. We used the human
strain R. champanellensis and the related rumen
species Ruminococcus flavefaciens as reference
cellulosome-producing bacterial species (5, 6, 9)
to identify related species by searching for key
cellulosome genes in metagenome-assembled
genomes. We examined the functionality of the
cellulosomes in the species we discovered, how
these functions are rooted within these bacterial
lineages, their connection to their respective
host lifestyles and diets, and the dynamics
of their evolutionary trajectory from our primate
relatives to diverse human cultures.

This group of human gut bacteria produce 
functional cellulosomes, are phylogenetically 
related to the rumen-based R. flavefaciens, and 
are prevalent in several nonhuman primate 
(NHP) lineages. We found that these bac- 
teria have diversified within their various host 
ecosystems and have adapted to their lifestyles 
by acquiring genes from their surrounding microbial
communities. These cellulosome-carrying
species occur at low prevalence in westernized
human populations but at higher levels
in ancient human, hunter-gatherer, and nonwesternized
societies. Our data also indicate
that strains of these ruminococcal species are 
continuing to colonize the human gut from 
NHPs and ruminants and are dynamically ad- 
apting to the human gut ecosystem. 


Results 
Detection of fiber-degrading species in the 
human gut microbiome 


By identifying known cellulosomal components 
in genomes of Ruminococcus spp., we aimed 
to determine the breadth of the diversity of 
human gut cellulosome-producing species. 
Cellulosome complexes are heterogeneous mod- 
ular assemblies of structural proteins (scaffol- 
dins) and enzyme arrays that target different 
recalcitrant plant fiber components (Fig. 1A). 
The cellulosome complex is composed of mul- 
tiple scaffoldins that contain a multiplicity of 
cohesin modules each of which interacts with 
a complementary dockerin module located on 
each of the cellulosomal enzyme components 
(Fig. 1A). 

To retrieve and analyze cellulosome-producing 
ruminococcal genomes we used the scaC gene 
that encodes a definitive cellulosomal scaffoldin 
protein and that so far is known only in the 
Ruminococcus genus (15, 16) (Fig. 1A). Using 
this approach we searched for ScaC sequences 
in 4941 rumen metagenome-assembled genomes 
(MAGs) from domesticated ruminant cattle and 
92,143 human MAGs (17, 18), and identified 
251 ruminococcal genomes that contain ScaC. 
After filtering genomes exhibiting at least 90% 
completion as determined by CheckM (19), 
we obtained 25 and 22 genomes of rumen and 
human origin, respectively (table S1). Maximum 
likelihood phylogenetic analysis of their ScaC 
sequences revealed a clustering pattern that 
almost completely distinguishes between hu- 
man and rumen clades of ruminococcal ge- 
nomes. This analysis was augmented by 
ScaC sequences of 12 sequenced genomes of R. 
flavefaciens isolates from the rumen environ- 
ment and three sequenced genomes from 
isolates of their close relative from the human 
gut, R. champanellensis. 
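
The selection step described above (ScaC-positive genomes, then a completeness cutoff of at least 90%) can be sketched in a few lines of Python. This is an illustration only and not the authors' pipeline: the record fields, genome names, and completeness values are hypothetical, and only the 90% threshold and the ScaC criterion are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mag:
    """Hypothetical record for one metagenome-assembled genome."""
    name: str
    origin: str          # "human" or "rumen"
    completeness: float  # percent completeness (CheckM-style estimate)
    has_scac: bool       # True if a ScaC sequence was detected


def select_mags(mags, min_completeness=90.0):
    """Keep ScaC-positive MAGs at or above the completeness cutoff."""
    return [m for m in mags if m.has_scac and m.completeness >= min_completeness]


def count_by_origin(mags):
    """Count the selected MAGs per host origin."""
    counts = {}
    for m in mags:
        counts[m.origin] = counts.get(m.origin, 0) + 1
    return counts


# Toy input; names and values are invented for illustration.
example_mags = [
    Mag("mag_h1", "human", 96.2, True),
    Mag("mag_h2", "human", 84.0, True),   # fails the completeness cutoff
    Mag("mag_r1", "rumen", 91.5, True),
    Mag("mag_r2", "rumen", 99.0, False),  # no ScaC detected
]
selected = select_mags(example_mags)
print(count_by_origin(selected))
```

Applying both criteria in a single pass keeps the order of the filters irrelevant; in the study the ScaC search preceded the completeness filter, which yields the same final set.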

Fig. 1. Detection of a human-gut, fiber-degrading ruminococcal species.
(A) Scheme of cellulosome architecture. The CttA protein by virtue of its [...]
scaffoldin assemblies. (B) Unrooted phylogenetic tree computed with the
maximum likelihood method of 62 selected genomes and MAGs using the
sequence of the ScaC scaffoldin illustrated in Fig. 1A as a phylotyping marker
(15, 16) (table S1). The color of the clade indicates the origin of the genomic
bin (light blue, human; light green, rumen). Light purple circles on the
branches represent bootstrap values higher than 60%. The number and
composition of cellulosomal elements is indicated as a bar for each genomic
bin (number of dockerin-containing proteins with additional CAZyme elements,
dark gray; number of dockerin-containing proteins with no additional CAZyme
elements, medium gray; number of scaffoldins containing at least one cohesin
module, light gray). Brown circles next to the MAG name indicate genomes
containing a cttA gene. (C) Genomic dissimilarity computed by Mash distance
within the identified ruminococcal cellulosomal species and pairwise comparisons
to each other as well as to the ruminal R. flavefaciens species and the human
species R. champanellensis.

To deepen the phylogenetic analysis we further
examined the fibrolytic potential of these
62 genomes by searching for the presence of
cellulosomal elements and CAZymes (i.e.,
carbohydrate-active enzymes that act on gly- 
cosidic bonds) (20). We sought to identify the 
potential of enzyme components that integrate 
into cellulosome complexes, which would be 
detected by the presence of a dockerin mo- 
dule on the enzyme. We thus identified a total 
of 3687 dockerin-containing proteins among 
which 1853 also contained a CAZyme module 
(Fig. 1B), including glycoside hydrolases (GH), 
carbohydrate esterases, polysaccharide lyases, 
and carbohydrate-binding modules (CBMs) 
from various families. In addition, a total of 
308 scaffoldins were recovered. The phylo- 
genetic clusters of the tree corresponded to the 
distribution of the functional cellulosomal 
components of the identified MAGs. The human- 
associated MAGs were separated into four 
distinct clades (bootstrap values higher than 
90%) (Fig. 1B): two exhibited low numbers of 
cellulosomal elements (designated as Rumi- 
nococcus sp. 1 and Ruminococcus sp. 2 in the 
figure) whereas the remaining two exhibited 
high numbers of cellulosomal elements. The 
two latter clades were examined further and 


one was found to comprise sequences from
R. champanellensis. Notably, the second contained
ScaC sequences that were phylogenetically
closer to those of the R. flavefaciens
rumen isolate genomes (bootstrap value of 60%). 
The latter genomes also contained a cttA gene 
marker characteristic of the R. flavefaciens 
scaffoldin gene cluster. CttA is a cellulosomal 
protein that binds the bacterium to cellulose 
(Fig. 1A) (21). The gene for this cellulosome 
component represents a marker specific to 
R. flavefaciens that is absent from the human 
gut bacterium R. champanellensis. The cttA 
gene can therefore be used specifically to 
distinguish between the two closely related 
cellulosome-producing species. Consequently, 
members of the clade that encode the cttA gene
and occur in the human gut potentially repre- 
sent additional human gut fiber-degrading 
cellulosomal species. Genomes within this clade
shared an average of >99% similarity with each other
but only 78% similarity to the genomes of isolates
and MAGs affiliated with the rumen
R. flavefaciens (Fig. 1C) (22). In addition, we
retrieved the 16S-rRNA gene sequences of four
of the six MAGs, which showed
an average of 95.8 and 92.7% identity to the
rumen R. flavefaciens and human R. champanellensis
species, respectively, and 100%
identity to each other (table S2). This finding
supported their potential association as a dis- 
tinct ruminococcal species, which we registered 
as ‘Candidatus “Ruminococcus hominiciens” 
sp. nov.’ in the SeqCode registry (23). 
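
Genomic dissimilarity in Fig. 1C is reported as Mash distance. As a rough illustration of what such a distance measures, the sketch below computes the Mash distance formula, D = -(1/k) ln(2j / (1 + j)), from the Jaccard index j of two k-mer sets. It is a simplification under stated assumptions: the Mash tool estimates j from MinHash sketches of whole genomes, whereas this example uses exact k-mer sets on short toy sequences, and the sequences themselves are invented.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets; defined here as 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mash_distance(seq_a, seq_b, k=21):
    """Mash-style distance from the exact k-mer Jaccard index.

    Returns 0.0 for identical k-mer sets and 1.0 when no k-mers are shared.
    """
    j = jaccard(kmers(seq_a, k), kmers(seq_b, k))
    if j == 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


# Toy sequences (hypothetical); a small k is used because they are short.
a = "ACGTTGCAAGGCTTACGATCGATTACA"
b = "ACGTTGCAAGGCTTACGATCGATTGCA"
print(mash_distance(a, a, k=5))
print(mash_distance(a, b, k=5))
```

A single substitution changes up to k overlapping k-mers, which is why the distance grows with the number of mismatches yet stays small for closely related genomes.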

Two MAGs of human origin that also encoded 
the cttA gene marker and numerous celluloso- 
mal elements were not located within the 
R. hominiciens clade. Our data for genome sim- 
ilarity and marker genes (specified below) showed 
that these MAGs may also represent distinct 
cellulosome-producing bacterial species occupy- 
ing similar niches to R. hominiciens (Fig. 1B). 
One MAG was positioned within the rumen- 
associated MAG clade, and the second appeared 
as a single isolated branch of the phylogenetic 
tree. The 16S-rRNA sequence of the former MAG 
was not available but it exhibited low average 
genome similarities to the R. hominiciens (80%) 
and R. flavefaciens genomes (75.6%) (Fig. 1C, 
green background, and table S2). The latter 
MAG also exhibited low genome similarity to 
the R. hominiciens and R. flavefaciens strains, 
71 and 77.3%, respectively (Fig. 1C, orange
background), and its 16S-rRNA sequence exhibited
relatively low identity to the latter strains
as well (90.6 and 91.3%, respectively). These data
suggest that the strains are distinct species, and
they were therefore provisionally named in the SeqCode
registry. The human-associated MAG that was
positioned within the rumen clade was named 
‘Candidatus “Ruminococcus ruminiciens” sp. nov.’ 
and the other human-associated MAG that ap- 
peared as a single branch on the phylogenetic 
tree was named ‘Candidatus “Ruminococcus 
primaciens” sp. nov.’ In addition, Protologger 
analysis (24) of the R. ruminiciens, R. primaciens 
and R. hominiciens genomes indicated that these 
are species with potential for cellulose and starch 
utilization as well as acetate, propionate, and 
L-glutamate production, similar to that of 
R. flavefaciens (strain FD-1). 


Fiber-degrading bacterial species prevalence 
in nonindustrialized humans 


Fig. 2. Ruminococcus spp. are abundant in ancient human, hunter-
gatherer, and rural populations. (A) Observed collective prevalence of the
MAGs for fiber-degrading strains in various human, ape, and NHP cohorts. Pie
charts represent the observed prevalences. (B) Worldwide locations of positive
human and NHP samples. The locations of the samples in which the human
MAGs were detected are denoted on the map as circles: dark blue, industrialized
societies; light blue, rural societies and hunter-gatherers; green, paleofeces;
and pink, wild NHP. (C) Distribution of fibrolytic strains in human and NHP
populations. (i) Stacked bar chart of the distribution of each human cellulosomal
strain (R. champanellensis, R. hominiciens, R. ruminiciens, and R. primaciens)
across the sample cohorts. (ii) Heatmap of the distribution of the human
cellulosomal strains among the human- and NHP-positive samples. The
bar plot above the heatmap represents the number of strains detected in [...]

The prevalence and abundance of the fiber-
degrading species and known ruminococcal
species, R. flavefaciens and R. champanellensis,
were investigated across 1989 gut samples of
humans and animal species worldwide (Fig. 2A,
fig. S1, and table S3). The samples originated
from 75 animal species, including wild and
domesticated animals (NHP and ruminants),
as well as various human cohorts. This analysis
revealed that the human-associated genotypes
(R. primaciens, R. hominiciens, and
R. ruminiciens) are broadly distributed (Fig. 2B)
and are specific to humans and several NHP
species (i.e., macaques, baboons, gorillas, and
chimpanzees), but absent from the ruminant
samples tested (see figs. S1, S2, and S3). In addition,
the rumen MAGs were specific to ruminants
but absent from the human and NHP
cohorts tested (figs. S1, S2, and S3).

The prevalence and abundance of R.
primaciens, R. hominiciens, and R. ruminiciens
displayed notable variations among diverse
human cohorts. In industrialized countries,
including Denmark, China, Sweden, and the
USA, the collective prevalence of these strains
reached a maximum of 4.6% (Fig. 2A) with
some notable differences in R. hominiciens
prevalence between these countries (fig. S4). All
three strains exhibited higher collective prevalence
in the different cohorts of the nonindustrialized
populations we tested: 43%
prevalence in human paleofeces samples dating
from 1000 to 2000 years ago (25), 21% in hunter-
gatherers, and 20% in geographically diverse
rural societies (with no significant differences
among geographies, fig. S5). Samples from apes
and other NHPs had 41% and 33% prevalence,
respectively (Fig. 2A and fig. S2, A and B).
Furthermore, the abundance of these strains
in each positive individual was significantly
lower in industrialized populations when compared
with all nonindustrialized human populations,
as well as in apes and other NHP
samples (fig. S2C). The rumen strain was more
abundant in ruminants than the human strains
were in either human or NHP samples (fig. S6). The
variations in prevalence of these species in
human populations could potentially be linked
to dietary disparities between individuals in
industrialized and nonindustrialized societies 
(26, 27), as well as human activities that affect 
microbial diversity such as the use of antibiotics 
(28). Dietary fiber intake may be a major con- 
tributing factor given its close association with 
the prevalence and abundance of these species. 
Notably, adult Hadza hunter-gatherers typically 
consume 80 to 150 g per day (30) of dietary fiber 
whereas rural populations have substantially 
lower estimates at 13 to 14 g per day (31, 32), 
and industrialized populations even less at 
8.4 g per day. Moreover, the prevalence of
R. hominiciens strains was significantly lower in
captive than in wild apes, further strengthening
the connection between lifestyle, diet, and the
prevalence of these strains (fig. S7). In other NHP
samples R. primaciens was more prevalent in 
omnivorous than in folivorous monkeys, sug- 
gesting that the fiber content in these diets is 
sufficient and that other factors may also play 
a role (fig. S8). Furthermore, the high preva- 
lence and abundance of these strains in human 
samples dating back 1000 to 2000 years (25) 
and among hunter-gatherer populations, coupled 
with the global distribution of the human 
Ruminococcus spp. strains (Fig. 2B), suggests 
that although these lineages currently exist in 
limited proportions of human populations they 
were previously more widespread and abun- 
dant, consistent with a recent study that shows 
loss of taxa while humans speciated from great 
ape relatives and while switching from a non- 
industrialized to industrialized lifestyle (29). 
We found similar levels of prevalence for the 
fiber-degrading species and the previously iden- 
tified R. champanellensis cellulolytic strains in 
human gut samples (fig. S1), which led us to 
investigate the potential exclusion or cooper- 
ation processes that might drive the distribu- 
tion of these species and strains. Analysis of 
the strain distribution of Ruminococcus spp.
revealed that when fiber intake is high, as in 
nonindustrial countries, strain diversity increases 
whereas in most of the samples originating from 
humans of industrial countries, only one fibro- 
lytic strain was detected, indicating potential 
competitive exclusion among these species 
when fiber intake is low (Fig. 2Cii). An alter- 
native scenario to exclusion would be the sto- 
chastic effects of loss due to antimicrobial 
selection in industrialized countries. By contrast, 
human samples from either hunter-gatherer 
societies or nonindustrialized countries as 
well as apes and other NHP samples exhib- 
ited various combinations of two or more spe- 
cies of Ruminococcus spp., which suggests 
reduced competition possibly attributable to 
greater access to fiber-rich diets and/or increased 
niche availability. Niche availability such as car- 
bohydrate diversity may enable niche partition- 
ing among strains through variations in glycolytic 
hydrolysis-coding genes present in their genomes,
ultimately leading to a higher diversity of
fibrolytic strains in these samples (Fig. 2Ci). 
The examination of different strains’ preva- 
lence and abundance within individual hosts 
allowed us to also investigate host-strain asso- 
ciations. Our findings provided evidence of 
distinct host preferences among the various strain 
lineages. Specifically, R. primaciens exhibited 
a significant association with other NHPs and 
ancient humans (indval test P-value = 0.01) 
whereas R. hominiciens is significantly asso- 
ciated with humans and apes (indval test 
P-value = 0.005; Fig. 2Cii). Furthermore, R. 
ruminiciens—characterized by its higher simi- 
larity to the rumen strains (see below)—was 
found to be rare in all samples (Fig. 2Ci). 
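
The two quantities used throughout this section, collective prevalence per cohort and the number of fibrolytic strains detected per positive sample, can be made concrete with a small Python sketch. It is illustrative only: the sample table is invented, the data layout (a cohort label plus the set of species detected) is an assumption, and the detection criteria used in the study are not modeled.

```python
TARGET_SPECIES = {"R. hominiciens", "R. primaciens", "R. ruminiciens"}


def prevalence_by_cohort(samples, species):
    """Fraction of samples per cohort with at least one target species.

    `samples` is a list of (cohort, detected_species_set) pairs.
    """
    totals, positives = {}, {}
    for cohort, detected in samples:
        totals[cohort] = totals.get(cohort, 0) + 1
        if detected & species:
            positives[cohort] = positives.get(cohort, 0) + 1
    return {c: positives.get(c, 0) / totals[c] for c in totals}


def species_per_positive_sample(samples, species):
    """Number of target species found in each positive sample."""
    return [len(detected & species) for _, detected in samples if detected & species]


# Hypothetical toy cohort table (not data from the study).
toy_samples = [
    ("industrialized", set()),
    ("industrialized", {"R. hominiciens"}),
    ("industrialized", set()),
    ("industrialized", set()),
    ("hunter-gatherer", {"R. hominiciens", "R. primaciens"}),
    ("hunter-gatherer", set()),
]
print(prevalence_by_cohort(toy_samples, TARGET_SPECIES))
print(species_per_positive_sample(toy_samples, TARGET_SPECIES))
```

Keeping prevalence (how many hosts carry any of the species) separate from the per-sample strain count (how many co-occur in one host) mirrors the distinction the text draws between Fig. 2A and Fig. 2C.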


Ongoing colonization by ruminococci 
in the human gut 


We studied the potential evolutionary scenarios 
for core proteins found in all genomes of the 
ruminococcal strains. Because we have also 
identified these strains in NHPs, we augmented 
our MAG set with eight additional MAGs 
originating from NHP-gut samples (30). The 
latter genomes were assembled with at least 
90% genome completion as analyzed by CheckM 
and are 98% similar to the R. primaciens strain. 
We predicted the open reading frames (ORFs)
from the different strains’ genomes (14 rumen-,
8 human-, and 8 NHP-associated MAGs) and
clustered them into 5958 orthologous groups
using the Proteinortho program (37). The different host-associated
strains shared a core genome composed of 315 
orthologous protein groups from which we 
generated maximum likelihood trees that were 
colored according to the samples in which the 
MAGs were assembled (fig. S9). In all of the trees
the proteins clustered according to the respective
host (human, NHP, or ruminant) (Fig. 3, A and B),
suggesting within-host clonal diversification
and potential speciation. The exceptions were
two strains, one a R. primaciens MAG and the
other a R. ruminiciens MAG, both assembled
from human samples, which suggests
recent transfer from NHPs and ruminants to
the human gut (Fig. 3).
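
The notion of a core genome used above, orthologous groups with a member in every genome analyzed, amounts to a set-inclusion test. The Python sketch below illustrates it on toy data; the group identifiers, genome names, and the dictionary layout are hypothetical and do not reproduce the Proteinortho output format.

```python
def core_groups(group_members, genomes):
    """Return the orthologous groups that have a member in every genome.

    `group_members` maps a group identifier to the set of genomes in which
    at least one protein of that group was found.
    """
    required = set(genomes)
    return sorted(
        group for group, members in group_members.items()
        if required <= set(members)
    )


# Hypothetical toy input: three genomes, four orthologous groups.
toy_genomes = ["rumen_1", "human_1", "nhp_1"]
toy_groups = {
    "OG1": {"rumen_1", "human_1", "nhp_1"},
    "OG2": {"rumen_1", "human_1"},
    "OG3": {"human_1", "nhp_1", "rumen_1"},
    "OG4": {"nhp_1"},
}
print(core_groups(toy_groups, toy_genomes))
```

Groups that fail the test (here OG2 and OG4) make up the accessory genome, which is the part examined later for host-specific gene content.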

In some cases, a cospeciation scenario em- 
erged, with host phylogeny significantly corre- 
lated with matching associated strains such as 
human hosts with human strains and primate 
strains with NHP hosts supported by both 
Mantel correlation and AU tests (Fig. 2A). However,
in most of the trees, as well as in both a
multilocus sequence analysis (MLSA) tree and
a concatenated tree comprising the majority 
of core genes (fig. S10, A and B), the human- 
associated strain clade was closer to the rumi- 
nant than NHP clade (Fig. 3B). 

In our ancestral analysis we traced the ori- 
gin of R. hominiciens strains back to their roots 
in ruminant strains supported by a significant
92% bootstrap split in the concatenated tree
that places the human-associated clade within
the ruminant clade. This pattern remained con- 
sistent across the majority of individual trees 
within both evolutionary scenarios (see fig. 
S10). Phylogenetic analysis of the concaten- 
ated tree demonstrated that R. primaciens 
strains associated with NHPs exhibited a sig- 
nificantly shorter phylogenetic distance to the 
ancestor of all human strains when compared 
with that of the R. hominiciens strains (one- 
sided Wilcoxon rank-sum test P-value: 0.000174) 
(Fig. 3B). Collectively these findings strongly 
suggest that R. primaciens is the closest rel- 
ative to the ancestors of all human strains and 
that the ancestors of the R. hominiciens strains 
originated from ruminant strains. We can thus 
speculate that the transfer to humans occurred 
during the domestication process, with these 
strains subsequently adapting and diversifying 
within the human gut environment. 
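
The distance comparison above relies on a one-sided Wilcoxon rank-sum test. For readers unfamiliar with it, the following Python sketch computes an exact one-sided P-value by enumerating every way of assigning the pooled ranks to the first group. It is a minimal illustration, assuming small samples and hypothetical distance values; it is not the authors' implementation, which may have used a different (for example, approximate) procedure.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def rank_sum_p_less(x, y):
    """Exact one-sided P-value for the alternative 'x tends to be smaller than y'.

    Enumerates all assignments of pooled ranks to a group of size len(x);
    feasible only for small samples.
    """
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    observed = sum(ranks[:len(x)])
    hits = total = 0
    for subset in combinations(range(len(pooled)), len(x)):
        total += 1
        if sum(ranks[i] for i in subset) <= observed + 1e-9:
            hits += 1
    return hits / total


# Hypothetical distances to an ancestral node (arbitrary units).
dist_group_a = [0.11, 0.12, 0.13]
dist_group_b = [0.21, 0.22, 0.25]
print(rank_sum_p_less(dist_group_a, dist_group_b))
```

With two fully separated groups of three values each, only one of the 20 possible rank assignments is as extreme as the observed one, so the smallest attainable P-value depends directly on the group sizes.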


Functional cellulosome fiber-degradation and 
cellulose-adhesion activities 


Our phylogenetic analyses identified R. 
primaciens strains as the closest to the an- 
cestor of human cellulolytic strains and in- 
dicated recent transfer of this species into the 
human gut (Fig. 3). This discovery provided the 
opportunity to investigate whether the ances- 
tral human gut R. primaciens strain can effi- 
ciently degrade crystalline cellulose and produce 
active assembled cellulosomes composed of 
components common to the other rumino- 
coccal strains. To this end, we identified both 
scaffoldins and enzymes that were shared among 
R. primaciens, R. hominiciens, R. ruminiciens, 
and R. flavefaciens (table S4). We examined 
their potential for cellulosome assembly using 
the matching fusion-protein approach (32) in 
which the binding abilities of the recombi- 
nant proteins—seven cohesin and six dockerin 
modules—were measured (table S5). Out of 
the 36 potential interactions tested, 10 positive 
interactions were thus identified which enabled 
us to predict the cellulosomal assembly of these 
modules (Fig. 4A and table S6). The proposed 
structure of the R. primaciens cellulosome 
(Fig. 4B) resembles the known R. flavefaciens 
cellulosomal organization in strains isolated 
from ruminants (33). In both R. primaciens 
and R. flavefaciens strains, the scaffoldin pro- 
teins show a similar interaction pattern whereby 
the dockerins of the ScaA and ScaC scaffoldins 
interact with the cohesins of ScaB through 
divergent cohesin-dockerin interactions (see 
Fig. 4A). The cellulosome is attached to the 
microbial cell wall through selective cohesin- 
dockerin interaction between ScaB and ScaE. 
Furthermore, the dockerin-containing enzymes
interact with their cohesin counterparts of
ScaA, ScaB, and ScaC with divergent specificities.
Finally, similar to ScaB, CttA is integrated
into the bacterial cell wall by means of
a similar type of cohesin-dockerin interaction
with ScaE. We measured the ability of cellulosomal
components from the two species as
well as from R. champanellensis to interact
with each other. We found cross-species interactions
of cellulosomal components of R.
primaciens with representative cohesin-dockerin
combinations from R. champanellensis 18P13
and R. flavefaciens FD-1, indicating evolutionary
conservation of the interaction residues and
a certain degree of promiscuity among these
components (tables S7, S8, and S9).

Fig. 3. Colonization by ruminococci is ongoing and dynamic in the human gut.
(A) Core protein phylogenetic tree illustrating the cospeciation hypothesis (left panel).
Blue circles on the branches represent bootstrap values higher than 60%. The
comparison with the phylogenetic tree of the mammalian host species is given on the
right with red lines indicating proteins that do not recapitulate host phylogeny.
(B) Phylogenetic tree of 197 concatenated core proteins. Blue circles on the branches [...]

We selected one of the GH5 cellulase en- 
zymes for biochemical characterization of its 
cellulolytic activity as this type of GH5 gene 
was common to 25 of the 30 MAGs used in our 
analyses. The GH5 enzyme exhibited cellulolytic 
activity on microcrystalline cellulose as a sub- 
strate (Fig. 4C and fig. S11) and its enzymatic 
activity was in a range similar to that of the 
R. flavefaciens FD-1 ortholog (68% sequence 
identity). We also purified the CttA protein 
from R. hominiciens and found that it exhib- 
ited robust binding to microcrystalline cellu- 
lose (Fig. 4D) indicating that the bacterial 
cells would bind to cellulose owing to the 
interaction with the cell-wall-anchored scaffol- 
din ScaE (see below and Fig. 4B) (34, 35). 
Altogether these results demonstrate that the 
cellulosomes of the ruminococcal strains are 
assembled and active on the crystalline cellu- 
lose substrate. 
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Specific host gut adaptation 

The phylogenetic clustering of R. hominiciens, 
R. primaciens, and R. ruminiciens strains 
according to their hosts (Figs. 1B and 3, A and 
B), along with the significant association of 
R. hominiciens to humans and apes and of 
R. primaciens to other NHPs and ancient 
humans, raise the question of whether host 
association is reflected in the coding capac- 
ity of the different strains. The genomes from 
the different host ecosystems (14 rumen, 8 
human, and 8 NHP MAGs) showed host spec- 
ificity in their gene content and expression 
pattern, in accordance with their respective 
host’s dietary preferences. 

Principal component analysis (PCA) of the 
5958 orthologous groups obtained earlier (Fig. 
5A and fig. S9) showed host specificity in the 
genome content of the ruminococci we iden- 
tified, yielding three distinct clusters corre- 
sponding to the different hosts (PERMANOVA 
P < 0.001), with the exception of two human 
assembled MAGs for R. primaciens and R. 
ruminiciens, which were located in the NHP 
clade and rumen clade, respectively (Fig. 5A). 
These results further support the notion that 
these strains represent a transitional adapta- 
tion stage. We further analyzed all strains for 
their core and flexible host-associated genomes 
to track the evolutionary trail that potentially 
brought about host adaptivity. Our analysis fa- 
vored gene acquisition from external lineages 
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represent bootstrap values higher than 77%. Blue highlighting on the right indicates 
a close phylogenetic distance between the human and ruminant clades. In (A) 

and (B) MAGs are color-coded according to host origin: green, blue, or pink indicate 
rumen, human, or NHP, respectively; transitional strains are denoted as “recent 
transfers” and the tree scales represent the number of amino acid substitutions per 
site. MAGs corresponding to Ruminococcus flavefaciens are indicated. 


as the more probable scenario that allowed 
these lineages to adapt to different hosts. 

The different host-associated strains shared 
a core genome composed of 315 orthologous 
groups common to the three species and a total 
of 233 host-specific orthologous groups that 
were found in all genomes of the given host- 
associated strains but not in the others (rumen, 
human, or NHP; fig. S9). We therefore asked to 
what degree the host-specific genes are rooted 
within the strain lineage as compared with the 
core genes. To this end we applied verticality 
analysis that measures the degree by which 
core and the host-specific genes are rooted 
within a strain phylogeny (36). While comparing 
verticality values for core proteins to those for 
host-specific orthologous groups, we found sig- 
nificantly higher values for the former (Fig. 5B). 
This finding indicated that host-specific genes 
were most probably gained by these strains 
from microbes that were coinhabiting the same 
specific host-associated gut environment whereas 
the core genes are endogenous to these strains 
and rooted within their lineage. 
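
The comparison behind Fig. 5B can be sketched as a two-sample Kolmogorov-Smirnov statistic, that is, the largest gap between the empirical distributions of the two sets of verticality values. The block below is a minimal pure-Python illustration only; the function name and all input values are hypothetical and are not data or code from the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap between step functions can only change at observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical verticality values, for illustration only.
core_values = [8.1, 9.4, 7.7, 10.2, 6.9]
host_specific_values = [0.5, 0.9, 1.2, 2.6, 7.0]
d_value = ks_statistic(core_values, host_specific_values)
```

A larger D means the two distributions are further apart; turning D into a P value additionally requires the sample sizes, which this sketch leaves out.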

The identified ruminococcal species are sus- 
pected to occupy the fiber-degrading niche 
within gut ecosystems and their prevalence 
correlates with the dietary fiber content of 
their hosts (Fig. 2, A and C). Hence, genome 
adaptivity to the host environment should also 
be apparent in the gene composition of the 
fiber-degrading functions. We therefore analyzed

Fig. 4. Cellulosome assembly activity and cellulose adhesion. (A) Summary of
interactions between selected cellulosomal recombinant cohesin and dockerin modules
derived from an R. primaciens strain (Human_SRR5958136_bin.38) compared with those
of orthologous modules from the R. flavefaciens FD-1 rumen strain (79). Cohesin
and dockerin modules are color-coded (red, yellow, or green) according to their
predicted specificities of interaction. On both panels, light blue highlights negative
interactions; darker blue, positive interactions; gray, not tested. On the left panel
(R. primaciens), intensities of the interactions are denoted with - for no affinity (OD450
lower than 0.15), + for moderate affinity (OD450 between 0.15 and 0.5), ++ for high
affinity (OD450 between 0.5 and 1.0), and +++ for very high affinity (OD450 between 1.0
and 2.2), respectively. On the right panel (R. flavefaciens), intensities were not available
for the Israeli-Ruimy 2017 study. (B) Overview of cellulosomal interactions in
R. primaciens compared with those of R. flavefaciens as deduced from affinity-based
ELISA experiments and proposed recognition residues of the dockerin
components (table S6). (C) Comparative cellulolytic activity of ruminococcal GH5
orthologs of either human (R. primaciens) or rumen origin (R. flavefaciens FD-1). Enzyme
samples were examined using microcrystalline cellulose (Avicel) as the substrate at
37°C. The data points represent the average of biological triplicates with standard
deviation. (D) Cellulose binding assay. SDS-PAGE gels loaded with cellulose-bound
(B) and -unbound (U) fractions of either R. hominiciens CttA, the CBM3a from the
CipA scaffoldin of the Clostridium thermocellum cellulosome as a positive control or
green fluorescent protein (GFP) as a negative control (nonbinding protein). 


[Fig. 5 image. Panel titles: (A) Strains' gene content corresponding to host association; (B) Vertical transmission of core and host-specific genes (gene verticality analysis, rank distribution); (C) Fibrolytic gene composition in strains by host association; (D) Fibrolytic gene expression in strains by host association; (E) Host-specific enrichment of functions (verticality value, specific gene content, and specific gene expression for GH9, GH3, GH2-doc, GH38, GH31, GH98-doc, GH98, GH141-doc, GH16, GH97-doc, GH105, and GH19).]

Fig. 5. Functional adaptation of MAGs with their host. In (A), (C), (D), and
(E), MAGs and samples are color-coded according to host origin: green, blue, or
pink indicating rumen, human, or NHP, respectively. (A) Principal component
analysis (PCA) of the overall predicted ORFs of the MAGs, color-coded by their
hosts (see below). Clustering analysis of MAG gene content according to their
hosts was performed using the PERMANOVA test with 1000 randomizations
of the data and the P-value is indicated. (B) Rank distribution of verticality values
for core proteins across the three host types versus host-specific proteins
indicates that specific genes are likely to be transferred through horizontal gene
transfer within a given type of host. (C) PCA of the fibrolytic system [indicating
glycoside hydrolase (GH) families] of the MAGs color-coded by their hosts.
Clustering analysis of MAGs GH family content according to their hosts was
performed using the PERMANOVA test with 1000 randomizations of the data and the
P-value is indicated. (D) PCA of the expression of the fibrolytic system as
[...] analysis of three fecal samples of the three hosts
(macaque, human, and sheep rumen). (E) Center panel: heatmap of the
statistically significant GH families that distinguish the strains associated with
the three gut ecosystems as determined by the Kruskal-Wallis test P < 0.05 after
[...] graph represents the verticality values for each
of these orthologous groups of genes. (Right) heatmap of the statistically
significant GH expression (metatranscripts in FPKM) between the three types of
[...] Methods section). For the GH141-Doc and GH97-Doc genes,
the metatranscripts were aligned to Rumen_CADBJGO1 and Rumen_CACVQOO1 [...]

the repertoire of fiber-degrading enzymes from
these strains (table S7). We found that the
glycoside hydrolase (GH) families coded by the
different cellulosomal strains grouped into distinct
clusters on a PCA plot according to their
host, further corroborating host adaptation for
fiber degradation (Fig. 5C).

Metatranscriptome data of three samples
from each host gut ecosystem were analyzed by
read alignment to the strain’s genome and
showed that these fiber-degrading genes are
expressed within their host gut ecosystems,
pointing to high activity in the respective gut
systems (fig. S12A). In all samples of the three
hosts, expression of 50 to 82% of their overall
gene content was observed (fig. S12A). When
examining only cellulosomal genes, even higher
ratios were obtained with more than 90% of
cellulosomal gene expression of R. flavefaciens,
R. hominiciens, and R. primaciens in sheep,
humans, and NHPs (fig. S12, B to D). These
include a variety of key fibrolytic functions
and specific cardinal cellulases (GH5, GH9, 
and GH48) and hemicellulases (GH10, GH11, 
and GH26) that are common to these strains
(figs. S12 and S13). In addition, the amount 
and function of cellulosomal gene expression 
between the triplicate samples from the same 
ecosystem were almost identical, indicating 
the presence of a specific realized niche of fiber 
degradation for these bacteria within each of 
the host gut environments (fig. S12, B to D). 
Although high similarity exists in the cellulosomal
gene content and its expression level
between the strains, the fine-tuned differences
in gene presence and absence that are related
to cellulosomal adaptation to the different ecosystems
were also apparent in the expression
profile (Fig. 5D).

By analyzing the fiber-degrading gene repertoire
of the different species using the Kruskal-Wallis
test, we highlighted specific GH families
that statistically distinguish the strains associated
with the three gut ecosystems (Fig. 5E and
table S3). These findings showed that within
the different host gut ecosystems there are spe- 
cific host-related dietary components that trigger 
expression of these host-specific genes. For 
example, dockerin-containing GH families 2, 97, 
and 141 were only present in the R. flavefaciens 
rumen-associated strains and absent from the 
human- and NHP-gut R. hominiciens and 
R. primaciens genomes. These enzymes en- 
code various hemicellulolytic activities such as 
mannosidase, glucoamylase, and xylanase acti- 
vities, thus attesting to the richer spectrum of 
polysaccharides that exists in the rumen envi- 
ronment. Similarly, GH families coding for 
enzymes acting on cellulose (GH3 and GH9), 
mannans (GH31 and GH38), or arabinogalactan 
(GH105) were specific or present in higher 
numbers in both R. hominiciens and R. 
flavefaciens genomes and absent from R. 
primaciens genomes (table S7). In general, 
rumen-associated R. flavefaciens genotypes 
are richer in GH diversity and gene copy num- 
ber than the R. hominiciens genomes, both 
of which were richer compared with that of 
R. primaciens (table S7). Collectively, these 
differences could be related to the notion that 
rumen strains participate in the degradation 
of a major substrate critical to host survival 
and that the rumen system provides higher 
retention times whereas the human-based 
strains reside in the colon and deal with the 
undigested remnants of what has already passed
through most of the digestive tract with shorter
retention time. 
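
As a rough illustration of the family-by-family screen described in this paragraph, the following Python sketch computes a Kruskal-Wallis H statistic for one GH family across three host groups and then adjusts P values across families. It rests on several assumptions: exactly three groups are compared, so the chi-square approximation has 2 degrees of freedom and its tail probability is exp(-H/2); the false-discovery rate step is taken to be Benjamini-Hochberg, which the text does not specify; and the function names and all inputs are hypothetical. The chi-square approximation is coarse for small groups.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """Tie-corrected H and chi-square P value (df = 2) for exactly three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("this sketch needs exactly three non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, math.exp(-h / 2)  # chi-square survival function for df = 2


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one H and P value would be computed per GH family from its copy numbers in the rumen, human, and NHP genomes, and the list of P values would then be passed to `benjamini_hochberg`.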

Two GH families tightly connected to the host
dietary constraints were found to be coded and
expressed exclusively within specific host-associated
strains: GH family 19, which includes putative
chitinases and was exclusive to NHP-associated
MAGs of R. primaciens, and GH family 98,
which includes arabinoxylanases and was
exclusive to R. hominiciens genomes. Notably,
in the MAGs for which we hypothesize transi- 
tional stages of adaptation—i.e., the human- 
associated MAGs of R. primaciens and R. 
ruminiciens—the GH98 gene is either lacking 
or present in only one copy respectively (Fig. 
5E), which further suggests that these MAGs 
are in the process of adaptation to the human 
host. Likewise the GH19 gene is absent in the 
human MAG of R. primaciens which could 
suggest the loss of this function in human 
hosts. 

These host-exclusive functions could be ex- 
plained by host diet as the GH19 family found 
in R. primaciens genomes retrieved from NHP 
samples includes putative chitinases, which 
would presumably serve to degrade chitin of 
the insect exoskeleton ingested by the NHPs. 
GH98 enzymes found exclusively in the R. 
hominiciens genomes would potentially hydro- 
lyze glucuronoarabinoxylan, a hemicellulose 
that constitutes 25% of the primary cell walls 
of monocots such as rice, wheat, and maize, 
which are major components of the modern 
human diet (37). To test this, we cloned
and purified the putative GH98 enzyme of
R. hominiciens and measured its ability to degrade
corn glucuronoarabinoxylan as a model
substrate (fig. S14 and Fig. 5E, left), thus
confirming the potential role of GH98 in the
adaptation of the human-associated R. hominiciens
strain to the host diet.

Like other host-specific genes, these host- 
exclusive functions all have extremely low ver- 
ticality values (0.004, 0.86, and 2.61 for GH98, 
GH98-Doc, and GH19, respectively), which 
suggests potential transmission to the human 
and NHP strains through horizontal gene trans- 
fer from the respective gut ecosystem (Fig. 5E, 
left graph). Indeed, the putative GH98 catalytic
modules exhibited 44% sequence identity
to the GH98 enzyme of Bacteroides ovatus,
which was characterized as a glucuronoarabinoxylanase,
and the gene could potentially have been
acquired from this lineage (38).


Discussion 


We have identified three distinct, heretofore 
undescribed, cellulosome-producing, cellulolytic 
human gut ruminococcal species: Candidatus 
R. hominiciens, R. primaciens, and R. ruminiciens. 
Our evolutionary analysis strongly suggests 
that R. primaciens is the closest strain to the 
common ancestor of all human strains and that 
R. hominiciens likely originated in the rumi- 
nant gut and was later transferred to humans, 
possibly during domestication. Nevertheless, 
cospeciation cannot be ruled out at this time. 
These species underwent diversification and
host adaptation in their respective gut ecosystems.
Notably, host adaptation of these strains
primarily occurs through gene acquisition from
other members of the microbiome, as demonstrated
by verticality analysis.

These species appear to be declining in the
industrialized human gut. Nevertheless, com- 
prehensive understanding of their impact will 
be attained by future isolation of these strains 
and investigation of their physiology, fiber 
degradation potential, and effects on the host. 

The presence of these microbes in the human 
gut can offer significant benefits within the 
context of subsistence diets by maximizing 
nutrition from locally available foods in resource- 
limited societies, potentially providing energy 
through metabolic products. Indeed, these gut 
microbes are scarce in industrialized popula- 
tions but thrive in hunter-gatherer and rural 
communities where processed food consumption
is minimal and is accompanied by a higher
intake of natural, unprocessed plant fiber.
Additionally, these microbes are highly prevalent
and abundant in primates and in 1000- to
2000-year-old human gut samples, thus suggesting
that they may have been an integral
part of the ancestral human microbiome, con- 
sistent with a recent study that reported a 
higher prevalence of R. champanellensis in 
ancient and nonindustrialized human gut mi- 
crobiomes (25). 

Our research has revealed that these species 
continue to actively invade the human gut, 
as particularly evident in the case of strains of 
R. primaciens and R. ruminiciens. Although 
found in the human gut, their genomes appear 
to represent intermediates between primate- 
and rumen-gut ecosystems as they establish 
themselves in the human intestine, indicating that 
ruminants and NHPs may act as a source and 
reservoir for important cellulosome-producing 
ruminococcal strains, which continue to colo- 
nize and adapt to the human gut ecosystem. 
In this regard, a potential exists for their re- 
introduction or enrichment in the human 
gut through targeted diets and specialized 
probiotics. 


Materials and Methods 
Retrieval and analysis of ruminococcal genomes
containing cellulosomal elements


The ScaC sequence from R. flavefaciens strain 
FD-1 (accession number CAK18894) was used 
as a query sequence to retrieve metagenome- 
assembled genomes (MAGs) of rumen and 
human origin (77, 18), using local blast (39). 
Hits below E-values of 10°“, above 45% of 
sequence identities and of lengths higher 
than 250 amino acids were retained. Among 
these, only associated MAGs with above 90% 
completeness as determined by CheckM (79) 
were analyzed further. ScaC sequences were 
aligned using MegaX (40). Annotation of glycoside
hydrolases in the selected genomes
was performed with dbcan2 (41). The presence
of the N-terminal sequence of the CttA protein
(21) (427 amino acids, accession number
CAK18897.1), which corresponds to the cellulose-
binding component of the cellulosome system,
was used as a specific marker for R. flavefaciens
strains using tblastx. 
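
The retrieval criteria above can be sketched as a filter over tabular BLAST output followed by a completeness check. The block below is an illustrative sketch, not the authors' pipeline. It assumes the default 12-column tabular layout (`-outfmt 6`), in which percent identity, alignment length, and E-value are columns 3, 4, and 11. The E-value exponent is illegible in this copy of the text, so `max_evalue` is a required argument with no default. The function names, the hit-to-MAG mapping, and the example identifiers are hypothetical.

```python
def filter_hits(lines, max_evalue, min_identity=45.0, min_length=250):
    """Return subject IDs of tabular BLAST hits that pass all three cutoffs."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        identity = float(fields[2])   # percent identity
        length = int(fields[3])       # alignment length
        evalue = float(fields[10])    # expectation value
        if evalue < max_evalue and identity > min_identity and length > min_length:
            kept.append(fields[1])
    return kept


def complete_mags(hit_ids, mag_of_hit, completeness, min_completeness=90.0):
    """Return the MAGs carrying a retained hit whose completeness exceeds the cutoff."""
    mags = {mag_of_hit[h] for h in hit_ids if h in mag_of_hit}
    return sorted(m for m in mags if completeness.get(m, 0.0) > min_completeness)


# Hypothetical example rows (query, subject, identity, length, ..., evalue, bitscore).
example_rows = [
    "ScaC\tMAG1_prot7\t62.5\t310\t0\t0\t1\t310\t1\t310\t1e-80\t400",
    "ScaC\tMAG2_prot3\t40.0\t300\t0\t0\t1\t300\t1\t300\t1e-50\t200",
    "ScaC\tMAG3_prot1\t70.0\t120\t0\t0\t1\t120\t1\t120\t1e-30\t150",
]
```

Keeping the hit filter separate from the completeness filter mirrors the two-stage description in the text, where sequence hits are selected first and MAG quality is applied afterwards.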


Analysis of selected MAGs 


Dockerin- and cohesin-containing sequences
were retrieved from the predicted proteome
[using Prokka (42)] as detailed by Phitsuwan
et al. (43). Annotation of dockerin-containing
genes was performed using dbcan2. Mash 
analysis on the nucleotide level was performed 
on the genomes annotated using CttA as a 
marker (44). 


Prevalence of selected MAGs in rumen 
and gut samples 


First, the 30 selected MAGs of rumen,
human, and NHP origin were aligned to their
original sample reads (table S10). The number
of reads was normalized between samples,
and only alignments above 80% completion
were retained. A heatmap of MAG abundances
in the different samples was created using the
superheat package
(https://CRAN.R-project.org/package=superheat).
Then, to examine
the prevalence of selected MAGs across gut 
samples from human and animals, we clus- 
tered the different MAGs that contained the 
CttA marker (Fig. 1B) based on 97% simila- 
rity, using the drep algorithm (45). This step 
resulted in 3 human and 8 rumen MAGs 
representing the three human gut species 
(R. primaciens, R. hominiciens and R. rumi- 
niciens) and various strains of R. flavefaciens. 
The MAGs were aligned to metagenomes from 
gut or rumen fecal samples (25, 27, 29, 30, 46-68). 
Samples with coverage of at least 20% for a 
given MAG at a threshold of 1 were con- 
sidered positive. To normalize the variation in 
read depth between metagenomes, each meta- 
genome was subsampled to 5, 10, 20, 40, and 
60 million reads and each MAG prevalence 
was assessed as stated previously. A cutoff of 
10 million reads was determined optimal 
for comparative analysis. Prevalence for R. 
champanellensis was calculated similarly 
by aligning the 18P13 genome to the same 
fecal samples. 
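
The positivity rule above can be sketched as follows, under one reading of the text: a sample counts as positive for a MAG when at least 20% of the MAG's positions are covered by at least one read. That interpretation of "coverage of at least 20% ... at a threshold of 1" is an assumption, as are the function names and the per-base depth lists used as input.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Fraction of positions whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def is_positive(depths, min_breadth=0.20, min_depth=1):
    """True when the MAG is covered over at least min_breadth of its length."""
    return breadth_of_coverage(depths, min_depth) >= min_breadth


def prevalence(depths_by_sample, min_breadth=0.20, min_depth=1):
    """Fraction of samples that are positive for one MAG."""
    if not depths_by_sample:
        raise ValueError("at least one sample is required")
    positives = sum(
        1 for depths in depths_by_sample.values()
        if is_positive(depths, min_breadth, min_depth)
    )
    return positives / len(depths_by_sample)
```

Subsampling every metagenome to the same read count before computing depths, as described above, keeps this fraction comparable between deeply and shallowly sequenced samples.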


Evolutionary analysis of the selected MAGs 


Proteinortho (37) was used to group ortholo- 
gous proteins from human, rumen and NHPs 
genomes. For each of the 315 orthologous 
groups comprising the core genome shared be- 
tween the different host-associated strains, a 
phylogenetic tree was created using the mini- 
mal ancestor deviation (MAD) rooting approach 
(69). Moreover, we searched for orthologs in the 
genome of Clostridium thermobutyricum DSM 
4928 to serve as an outgroup. Outgroup ortho- 
logs were retrieved for 197 orthologous groups, 
and phylogenetic trees were created using the 
iqtree2 program package with 1000 bootstraps 
(70). We then performed an approximately unbiased
(AU) analysis (77) on all core proteins for
which outgroup orthologs were available (197
core proteins out of 315) to test a cospeciation
scenario. We used, as a hypothesis scenario,
one of the core protein trees that exhibited a 
high and significant correlation to the mamma- 
lian host’s evolutionary tree, [using the den- 
dextend R package with cor.dendlist function 
(correlation of 0.67, P-value <0.001) (Fig. 3A)] 
(72). The AU test was performed as part of the 
iqtree2 program package (70) while using the 
‘au’ parameter as well as the ‘-zb 10,000’ 
parameter to indicate the number of RELL 
(73) replicates to perform several tree topology 
tests for all 197 core orthologous groups trees. 
We then performed a host/parasite cospeciation
test [using the 'hommola_cospeciation'
function from the 'skbio' Python package (74)]
similar to Sanders et al. (29) to identify core
protein trees that exhibited similar host clustering
as the mammalian host’s evolutionary
tree [created using the Timetree database
(75)]. We also used the Mantel test (using the
'mantel.rtest' function from the 'ade4' R package),
which yielded similar results to the Hommola
cospeciation test. We concatenated all
the 197 core orthologous groups proteins and 
created a phylogenetic tree using iqtree2 pro- 
gram package with 1000 bootstraps (70). To 
examine whether R. primaciens is significant- 
ly closer to the most recent common ancestor 
of all strains identified in humans, we calculated 
the distance of each strain to the outgroup in 
the concatenated tree. We used the Wilcoxon 
rank sum exact test (two-sided) to test whether 
the distances of R. primaciens to the outgroup 
are smaller than the distances of all other 
human strains. All data and code are available 
in GitHub repository (76). 
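
The distance comparison described above can be illustrated with an exact two-sided Wilcoxon rank-sum test computed by brute-force enumeration. This is a sketch under stated assumptions rather than the study's code: it enumerates every possible split of the pooled ranks, which is only practical for small groups, it treats the two-sided P value as the share of splits whose rank sum deviates from the null mean at least as much as the observed one, and the function names and distance values are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Return (rank sum of x, exact two-sided P) by enumerating every split."""
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    observed = sum(ranks[:len(x)])
    expected = len(x) * (len(ranks) + 1) / 2  # mean rank sum under the null
    deviation = abs(observed - expected)
    total = extreme = 0
    for subset in combinations(ranks, len(x)):
        total += 1
        if abs(sum(subset) - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total


# Hypothetical distances to the outgroup (substitutions per site).
primaciens_distances = [0.10, 0.12, 0.11]
other_human_distances = [0.30, 0.25, 0.28]
```

With three values per group, the smallest attainable two-sided P value is 2/20, which shows why the number of strains per group limits what such a test can detect.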

To perform MLSA (77), amino-acid sequences
of the β subunit of RNA polymerase (rpoB), β subunit
of DNA gyrase (gyrB), translation initiation
factor IF-2 (infB), RNA modification
GTPase ThdF or TrmE (thdF), chaperonin
GroEL (groEL), and sigma 70 (sigma D) factor
of RNA polymerase (rpoD) were retrieved
from each of the 30 MAGs, aligned, and concatenated
using MegaX (40), and a maximum
likelihood phylogenetic tree was generated.


Cloning of cellulosomal modules and enzymes 
from human strains 


Thirteen sequences of dockerins and cohesins 
were selected from the R. primaciens strain 
and synthesized by IDT (Coralville, Iowa, USA) 
with additions of restriction sites at both ends. 
The synthesized DNA sequences of cohesins 
and dockerins were inserted into CBM-Coh and 
Xyn-Doc plasmid cassettes respectively (32), 
using appropriate restriction endonucleases 
(Thermofisher Scientific). T4 ligase (New England 
Biolabs) was used for plasmid ligation and 
Escherichia coli strain DH5 alpha (Bio Lab, 
Israel) was used for transformation. Plasmids 
were verified by Sanger sequencing.

The sequence of a GH5 enzyme from R.
primaciens strain was also synthesized by IDT 
and cloned into pET28a, using either restriction 
or restriction-free cloning. The N-terminal se- 
quence of the GH5 was reconstructed using the 
consensus sequence of highly similar GH5 se- 
quences, recovered by blastp (fig. S15). GH98 was 
cloned from metagenomic DNA extracted using 
the phenol-chloroform method (78) from a hu- 
man sample, in which the CttA gene was de- 
tected using specific primers for CttA (table S11), 
cleaved using NcoI and XhoI and inserted into 
restricted pET28a by ligation. The list of all 
primers used in this study is available in table 
S11. The amino-acid sequences of the proteins 
used in the study are available in table S5. 


Expression and purification of recombinant 
proteins and GH-containing dockerins 


The proteins were expressed and purified as 
described earlier (2) with incubation at 37°C for 
3 hours following induction with 0.2 mM isopropyl
β-D-1-thiogalactopyranoside (IPTG). The
Xyn-Doc fusion proteins and GH-containing 
dockerins were purified using Ni-NTA beads 
(EMD, MERCK-Millipore) and CBM-Coh fusion 
proteins using amorphous cellulose (PASC). 


Affinity-based ELISA analysis of cohesins using 
immobilized dockerins 


The procedure of Barak et al. (32) was followed.
Cohesins and dockerins from R. champanellensis
18P13 and R. flavefaciens FD-1 for
cross-species interactions were cloned and pro- 
duced as described earlier (10, 79). All binding 
affinity assays were performed at least twice in 
biological triplicates. 
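
The ELISA readout is summarized in the Fig. 4 legend with four affinity classes defined by OD450 ranges. A minimal sketch of that binning is given below. The handling of values that fall exactly on a boundary and of readings above 2.2 is an assumption, since the legend does not specify either, and the function name is hypothetical.

```python
def affinity_label(od450):
    """Map an OD450 reading to the symbol used in the Fig. 4 legend."""
    if od450 < 0:
        raise ValueError("OD450 cannot be negative")
    if od450 < 0.15:
        return "-"    # no affinity
    if od450 < 0.5:
        return "+"    # moderate affinity
    if od450 < 1.0:
        return "++"   # high affinity
    # Very high affinity; the legend gives 2.2 as the upper end of this range.
    return "+++"
```

Half-open intervals are used so that every reading falls into exactly one class.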


Enzymatic activity assay 


Cellulolytic activity was tested with 0.5 µM
of either GH5 from R. primaciens or from
R. flavefaciens FD-1 (table S5) on 1% Avicel
microcrystalline cellulose (FMC, Delaware, USA)
at pH 5 (50 mM acetate buffer, final concentration)
for 24, 48, and 72 hours at 37°C. Kinetics of
amorphous cellulose degradation were followed
by incubating the GH5 enzymes at concentrations
ranging from 0 to 1 µM at pH 5 for 1 hour at
37°C with 7.5 g/l substrate. After incubation, the
tubes were centrifuged for 2 min at 14,000 rpm
at room temperature, and 100 µL of supernatant
fluids were added to 150 µL dinitrosalicylic acid
(DNS) solution (80), boiled for 10 min, and the
absorbance at 540 nm was measured. Released
sugar concentrations were determined using a
glucose standard curve.
Glucuronoarabinoxylanase activity was tested
by incubating 0.2% corn glucuronoarabinoxylan
(38) in 20 mM citrate buffer (pH 6) with
20 µL of either purified GH98, double-distilled
water (ddw), or the lysate of an R. flavefaciens
strain 17 culture, grown in M2 medium supplemented
with 0.2% cellobiose, incubated overnight
at 37°C. Two microliters of the reactions
were spotted on TLC Silica gel 60F (Merck), and
chromatography was carried out for 1.5 hours, 
using butanol:acetic acid: water 3:1:1 as a 
developing solvent. After drying the plate, spots 
were visualized by orcinol stain (5 g orcinol 
dissolved in 376.65 ml ethanol, 107 ml ddw 
and 16.15 ml sulfuric acid), and the silica plate 
was heated for 10 min at 70°C. 

All enzymatic assays were performed at least 
twice in biological triplicates. 
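
The conversion from absorbance at 540 nm to released sugar in the DNS assay relies on a glucose standard curve. One common way to do this, shown here as a sketch rather than the authors' procedure, is an ordinary least-squares line through the standards that is then inverted for each sample reading. The function names and the standard concentrations and absorbances are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absorbance_to_glucose(a540, slope, intercept):
    """Invert the standard curve to estimate glucose-equivalent concentration."""
    return (a540 - intercept) / slope


# Hypothetical glucose standards (g/l) and their absorbance at 540 nm.
standard_conc = [0.0, 0.5, 1.0, 2.0]
standard_a540 = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_line(standard_conc, standard_a540)
```

Readings outside the range spanned by the standards should be treated with caution, because the linear relationship is only checked within that range.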


Cellulose binding assay 


Binding ability of CttA to cellulose was tested 
by the cellulose binding assay as described 
earlier (87). The CBM and cohesin-CBM3a from
the CipA scaffoldin of Clostridium thermocellum
(81) were used as positive controls, and the GFP
protein as a negative control for binding abilities.
The binding assay was performed at least
three times (biological replicates).


Comparative genomics of selected human, 
rumen and NHP genomes 


Among the 5958 gene clusters obtained by 
Proteinortho, the 315 clusters common to the 
three groups were analyzed for verticality as 
well as clusters specific to one or two hosts. 
For verticality mapping, sequences were com- 
pared with the verticality values calculated by 
Nagies et al. (36). This was done by blasting all 
sequences in the database, which formed the 
basis for the clustering used in the latter report, 
against each sequence of interest. Results were 
filtered by an E-value of 107°, and sequences of 
interest were then mapped to the cluster with 
the highest number of hits. If the mapped 
cluster had a calculated verticality value, 
this value was then mapped to the sequence of 
interest. 
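
The mapping step described above can be sketched as a majority vote over filtered hits. This is an illustrative reading of the procedure, not the published code. The E-value cutoff is illegible in this copy of the text and is therefore a required argument, ties between clusters are broken by first occurrence (an arbitrary choice), and the function and variable names are hypothetical.

```python
from collections import Counter


def map_verticality(hits, cluster_of, verticality, max_evalue):
    """Assign a verticality value to one sequence of interest.

    hits: iterable of (database_sequence_id, evalue) pairs for that sequence.
    cluster_of: database sequence ID -> cluster ID.
    verticality: cluster ID -> verticality value (clusters may be missing).
    """
    counts = Counter(
        cluster_of[seq_id]
        for seq_id, evalue in hits
        if evalue <= max_evalue and seq_id in cluster_of
    )
    if not counts:
        return None  # no hit survived the filter
    best_cluster, _ = counts.most_common(1)[0]
    # None is returned when the best cluster has no calculated verticality.
    return verticality.get(best_cluster)
```

Returning `None` in both failure cases keeps unmapped sequences out of downstream comparisons instead of assigning them a default value.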

The presence-absence of the overall 5958 gene 
clusters, or number of annotated glycoside 
hydrolases (with and without dockerin modules) 
obtained using dbcan2, were compared among 
the three groups of selected genomes (human, 
rumen and NHP) using PCA plot in R with 
phyloseq (82) and ggplot2 (83), followed by 
the PERMANOVA test using 1000 randomiza- 
tions of the data and the vegan package (84). 
To highlight statistically different groups of 
GH, we performed a Kruskal-Wallis test, fol- 
lowed by false-discovery rate correction, and 
created abundance heatmaps for genes or tran- 
scripts, using the superheat package (https:// 
CRAN.R-project.org/package=superheat) (85). 
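
The study ran PERMANOVA in R with the vegan package. As a language-neutral illustration of what the test computes, the Python sketch below derives the pseudo-F statistic from a distance matrix and estimates a P value by shuffling host labels. It is not the authors' code: the use of Jaccard distance on gene presence-absence sets, the function names, and the fixed random seed are assumptions made for the example.

```python
import random


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of gene-cluster IDs."""
    union = len(a | b)
    return 0.0 if union == 0 else 1 - len(a & b) / union


def distance_matrix(gene_sets):
    return [[jaccard_distance(a, b) for b in gene_sets] for a in gene_sets]


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from squared pairwise distances."""
    n = len(labels)
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for members in groups.values():
        pairs = sum(
            dist[i][j] ** 2
            for pos, i in enumerate(members)
            for j in members[pos + 1:]
        )
        ss_within += pairs / len(members)
    if ss_within == 0:
        return float("inf")  # groups are internally identical
    return ((ss_total - ss_within) / (k - 1)) / (ss_within / (n - k))


def permanova(dist, labels, permutations=1000, seed=0):
    """Return (observed pseudo-F, permutation P value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)
```

With 1000 randomizations the smallest P value this estimator can report is 1/1001, so a result such as P < 0.001 is the strongest statement the permutation count allows.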


Expression of R. hominiciens genes in 
human samples 


RNA was extracted from two positive Israeli fecal
samples using the Qiagen AllPrep PowerFecal
DNA/RNA Kit, and the samples that yielded
high-quality RNA were sequenced by NovaSeq
SP 2x150nt (Roy J. Carver Biotechnology Center,
Illinois). Reads from sample 50466110
from project PRJNA354235, which was found
positive in the MAG alignments, were also used.
Reads from the metatranscriptomics of three 
macaque fecal samples (86) and three sheep 
rumen samples (59) were retrieved from the 
ENA database (macaque project SRX3517701- 
SRX3517724, samples SRR6425354, SRR6425396 
and SRR6425408 and sheep project PRJNA202380, 
samples SRR1206249, SRR1138694 and SRR1138697). 
Reads were subsampled to 1,000,000 reads, and 
transcripts were quantified using RSEM (87)
against their respective MAGs
Human_SRR6028624_bin.16, Rumen_CACVSXO1, and
Macaque_bin.22. The transcripts of the annotated
GHs (with and without dockerin modules)
obtained with dbcan2, were compared among 
the three groups of selected genomes (human, 
rumen and NHP) using PCA plot in R with 
phyloseq (82) and ggplot2 (83), followed by 
the PERMANOVA test using 1000 randomizations
of the data and the vegan package (84).